The Prediction of a 3-Protein-Based Model on the Prognosis of Head and Neck Squamous Cell Carcinoma

Background. Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignant tumors. Using high-throughput genomic methods, RNA-based diagnostic and prognostic models for HNSCC with potential clinical value have been developed. However, the clinical utility and reproducibility of these models are uncertain. Because of the complex regulatory processes occurring after mRNA is transcribed, the abundance of proteins in a cell can never be fully predicted or explained by their corresponding mRNA expression. We aimed to develop and validate a novel protein signature for predicting the prognosis of HNSCC patients. Methods. The functional proteomic data of 332 HNSCC cases were collected from The Cancer Proteome Atlas (TCPA), and the related follow-up and clinical data were acquired from The Cancer Genome Atlas (TCGA). This study used univariate and multivariate Cox regression analysis, the Akaike Information Criterion, receiver operating characteristic (ROC) analysis, and the Kaplan-Meier method. Results. Patients' clinical features in the training and test sets were comparable (all, P > 0.05). The area under the ROC curve (AUC) for the 3-protein signature (X4EBP1_pT37T46, HER3_pY1289, and NF2) was 0.655 in the test set and 0.699 in the combined cohort (all 332 patients). In addition, the 3-protein signature exhibited better predictive value for the survival of HNSCC patients than conventional clinical factors such as age, gender, TNM stage, and smoking history. Conclusion. The 3-protein signature developed in this study exhibits good performance in predicting the overall survival of HNSCC patients. The 3-protein signature exhibited better predictive value for survival than conventional clinical factors such as gender, TNM stage, smoking history, and age.


Introduction
Head and neck squamous cell carcinoma (HNSCC) is a malignancy originating from the oropharynx, hypopharynx, oral cavity, and larynx. More than 550,000 persons worldwide are diagnosed with HNSCC annually, resulting in 380,000 deaths [1]. Epidemiological studies have indicated that the incidence of HNSCC is growing, and the 5-year survival rate is <50% despite advances in treatments such as surgery, radiation therapy, and chemotherapy [2][3][4][5]. Survival is <1 year in locally advanced HNSCC patients who develop metastases or relapse [6]. Alcohol consumption, human papillomavirus (HPV) infection, and smoking are related to the occurrence, progression, and prognosis of HNSCC [7]. However, the reliability of these risk factors is unclear [8]. HNSCCs associated with tobacco use and with HPV have been shown to have different molecular signatures, complicating the use of molecular techniques to predict survival and develop targeted treatments [9]. Because of the molecular heterogeneity and etiological complexity of HNSCC, it is difficult to identify novel biomarkers that can aid prognosis prediction and guide therapy [8,10]. Using high-throughput genomic methods, RNA-based models with potential clinical value have been developed for the prognosis and diagnosis of HNSCC [11][12][13][14]. However, the clinical utility and reproducibility of these models are uncertain [15]. The modified proteome represents the final result of different molecular pathways and has the potential for the therapeutic targeting of malignancies. However, due to the complex regulatory processes occurring after mRNA is transcribed, the abundance of proteins in a cell can never be fully predicted or explained by their corresponding mRNA expression [16]. As such, proteomic analysis of tumors can provide researchers with large amounts of bioinformatics data different from those obtained by RNA or DNA sequencing.
In recent years, protein-based prognostic signature models have been developed to predict cancer survival. For example, Xie et al. [17] developed a 3-protein predictive risk score model for progression-free survival (PFS) and overall survival (OS) in high-grade serous ovarian cancer. Han et al. [18] identified 4 protein biomarkers that are prognostic for kidney renal clear cell carcinoma. Patil and Mahalingam [19] successfully predicted the survival of lower-grade glioma patients using a 4-protein prognostic signature.
The Cancer Proteome Atlas (TCPA) is an open-access bioinformatics resource that belongs to The Cancer Genome Atlas (TCGA) Project [20,21]. It contains protein expression data of many tumor cell lines generated by reverse-phase protein arrays (RPPAs) [20,21]. In this paper, a novel protein signature for determining the prognosis of HNSCC patients was constructed and validated using the functional proteomic data collected from TCPA.

Patients and Proteomic Data. The functional proteomic data of 347 HNSCC patients were obtained from the TCPA online database (http://tcpaportal.org), and TCGA (https://cancergenome.nih.gov/) provided the corresponding clinical and follow-up data. After cases with incomplete clinical follow-up records were removed, data from 332 cases were included in this study.
The 332 patients were randomly assigned to a training set (n = 168) and a test set (n = 164), with the aim of keeping the variables in the 2 sets comparable. The prognostic model was developed using the training set and verified using the test set.
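The random partition described above can be sketched as follows. This is a minimal illustration only, not the authors' procedure: the patient identifiers, the function name `split_cohort`, and the fixed seed are all assumptions introduced here, and the original analysis was carried out in R.

```python
import random

def split_cohort(patient_ids, n_train, seed=0):
    """Randomly partition patient IDs into a training set and a test set."""
    rng = random.Random(seed)       # fixed seed (an assumption) makes the split reproducible
    shuffled = list(patient_ids)    # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 332 enrolled cases.
ids = [f"case_{i:03d}" for i in range(332)]
train_ids, test_ids = split_cohort(ids, n_train=168)
```

After such a split, the clinical variables of the two sets would still need to be compared to confirm that the sets are comparable.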

Survival Analysis Based on the Functional Proteomic Data in TCPA Database. Candidate proteins were selected from the functional proteomic data using the Kaplan-Meier method and univariate Cox proportional hazards regression analysis in the survival R package.
Computational and Mathematical Methods in Medicine
Differences between the low- and high-expression groups were examined with the 2-sided log-rank test. Only proteins with P < 0.05 were considered candidate proteins.

Definition of Protein-Related Prognostic Model and Risk Score. Based on the above method, 7 proteins were chosen as candidates and entered into a multivariate Cox regression analysis to identify the preferred mathematical model with the Akaike Information Criterion (AIC). The model selected by AIC was considered to have the best informative efficacy and goodness of fit. After the multivariate Cox regression analysis, the risk score of each patient was calculated by the formula $\text{Survival Risk Score} = \sum_{k=1}^{n} (C_k \times V_k)$, where $n$ is the number of prognostic proteins, $C_k$ is the coefficient of the $k$th protein in the multivariate Cox regression analysis, and $V_k$ is the expression value of the $k$th protein. Proteins with $C_k > 0$ were considered to have a high-risk signature and those with $C_k < 0$ a low-risk signature. All functional proteomic data were analyzed using R software version 3.6.3.

Risk Stratification and Survival Curve. Based on the calculated risk score, the 168 patients were divided into low-risk (< median score) and high-risk (> median score) groups. An OS curve was generated with the Kaplan-Meier method in R, and differences in survival time were compared with the log-rank test.
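The stratification and survival-curve steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' R code: the function names are hypothetical, and assigning scores exactly equal to the median to the low-risk group is an assumption, since the text does not specify how ties are handled.

```python
from statistics import median

def stratify_by_median(scores):
    """Label each risk score 'high' if it exceeds the cohort median, else 'low'.
    Scores equal to the median go to 'low' (an assumption)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival probability) at each event time.
    times: follow-up durations; events: 1 = death observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk   # product-limit update
            curve.append((t, surv))
        at_risk -= removed                 # deaths and censored cases leave the risk set
    return curve

groups = stratify_by_median([0.1, 0.5, 0.9, 1.3])
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In practice one curve would be estimated per risk group and the two curves compared with the log-rank test, which is not implemented in this sketch.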
We also generated 3 survival curves comparing the low- and high-expression groups for each of the final 3 proteins included in the predictive model. Finally, risk curves, survival maps, and heat maps were plotted to show the distribution of risk scores and of each protein's expression among training set patients.

Independent Analysis of Prognosis and Comparison of Receiver Operating Characteristic (ROC) Curves. To assess the prognostic ability of clinical factors (age, gender, disease stage, and smoking history) and of the risk score, univariate and multivariate Cox regression analyses were conducted with survival status and time as the dependent variables; factors with P < 0.05 were considered to have independent prognostic value.
In addition, ROC curve analysis was used to evaluate the performance of the prognostic model and the clinical parameters, and the survivalROC R package was used to draw and analyze the ROC curves.

Validation in the Testing Set and in Combined Cohorts.
Based on the model obtained with the training set, we calculated the risk scores of the 164 patients in the test set. The subjects were divided into low- and high-risk groups according to the median score. The same process was carried out for all 332 patients combined (combined cohort). Kaplan-Meier survival curves were plotted for the testing set and for the combined testing and training sets, and survival differences between the low- and high-risk groups were compared with the log-rank test. The model's prognostic value was estimated by the AUC of both ROC curves.
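To make the AUC concrete, the sketch below computes the ordinary binary-outcome AUC as the probability that a randomly chosen patient who died has a higher risk score than a randomly chosen survivor. This is a simplification introduced here for illustration: it ignores censoring and follow-up time, so it is not the time-dependent estimator used for survival data, and the function name `binary_auc` is hypothetical.

```python
def binary_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    labels: 1 = event (death), 0 = no event. Tied scores count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

perfect = binary_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
tied = binary_auc([0.5, 0.5], [1, 0])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives a scale for reading the reported values.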

Protein Coexpression Analysis and the Sankey Diagram.
To identify proteins potentially correlated with the 3 proteins in the model, proteins in the functional proteomic data whose expression was significantly correlated with that of the proteins in the predictive model were identified using 2-sided Pearson's correlation coefficient analysis and the Z-test. Proteins with an absolute Pearson's correlation coefficient of >0.4 and a P value < 0.001 were considered to be positively or negatively correlated with the 3 proteins in the prognostic model. A Sankey diagram was plotted using the "ggalluvial" R software package to illustrate the potential correlations of the proteins.
Table 1 presents the clinical data of the 332 HNSCC cases in the testing (n = 164) and training (n = 168) sets. Of these cases, 200 had Stage IV, 61 Stage III, 57 Stage II, and 14 Stage I disease. Patients were randomly divided into the testing set (n = 164) and the training set (n = 168). No significant differences were observed in clinical variables (e.g., age, gender, TNM stage, survival time, and survival status) between the two sets (all, P > 0.05) (Table 1).
As shown in Figure 1 (volcano plot), 8 proteins were defined as low risk and 16 proteins as high risk. The prognostic value of the 24 proteins was determined with univariate Cox regression analysis (all, P < 0.05). Kaplan-Meier analysis was then conducted, and 7 proteins were selected as candidate proteins to build a prognostic model (Table 2).

The 3-Protein Signature Constructed from X4EBP1_pT37T46, HER3_pY1289, and NF2 Was Established by the Multivariate Cox Regression Analysis. A 3-protein prognostic model was established from 3 of the 7 candidate proteins, selected with stepwise multivariate Cox regression analysis. The 3 proteins selected were X4EBP1_pT37T46, HER3_pY1289, and NF2. The predictive model was based on the summed expression of the 3 proteins weighted by their relative coefficients. The relative coefficients were calculated using multivariate Cox regression and represent each protein's degree of risk (Table 3). The results of the multivariate survival analysis using the 3 proteins are shown in Figure 2. Every patient's survival risk score was calculated with the formula $\text{Survival Risk Score} = (-0.544877895 \times \text{X4EBP1\_pT37T46 expression value}) + (1.016464597 \times \text{HER3\_pY1289 expression value}) + (1.122403466 \times \text{NF2 expression value})$. Of the 3 proteins, X4EBP1_pT37T46 had a negative coefficient in the Cox regression analysis, indicating that it is protective: higher expression is associated with longer OS. Conversely, the coefficients of the other 2 proteins (HER3_pY1289 and NF2) were positive, so these proteins were considered risk factors: higher expression is associated with shorter OS.
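The risk score formula can be written out as a short worked example. The coefficients below are those reported in the text; the function name, the dictionary layout, and the expression values passed in are hypothetical placeholders, not patient data.

```python
# Coefficients reported for the multivariate Cox model.
COEFFICIENTS = {
    "X4EBP1_pT37T46": -0.544877895,  # negative coefficient: protective
    "HER3_pY1289": 1.016464597,      # positive coefficient: risk factor
    "NF2": 1.122403466,              # positive coefficient: risk factor
}

def survival_risk_score(expression):
    """Sum of each protein's expression value weighted by its Cox coefficient."""
    return sum(COEFFICIENTS[p] * expression[p] for p in COEFFICIENTS)

# Hypothetical expression values, chosen only to illustrate the arithmetic.
example = {"X4EBP1_pT37T46": 1.0, "HER3_pY1289": 1.0, "NF2": 1.0}
score = survival_risk_score(example)
```

With all three expression values set to 1, the score is simply the sum of the coefficients. Raising X4EBP1_pT37T46 expression lowers the score, while raising HER3_pY1289 or NF2 expression increases it.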

The 3-Protein Signature Can Predict the Survival of HNSCC Patients.
First, 3 survival curves comparing the high- and low-expression groups were generated on the basis of the expression of each of the 3 proteins in the predictive model (Figures 3(a)-3(c)). The Kaplan-Meier survival curves of the 2 groups based on the expression of the 3 proteins were significantly different (P value, log-rank test).
Next, using the median risk score described previously as the cutoff, patients in the training set were divided into a high-risk group and a low-risk group. Survival analysis indicated a marked difference in survival time between the high- and low-risk groups, further supporting the prognostic effectiveness of the 3-protein signature (Figure 3(d)). The risk curve, survival map, and heat map of the 3-protein signature are shown in Figure 4. As shown in Figures 4(a)-4(b), deaths were clearly more numerous in the high-risk region than in the low-risk region. As shown in Figure 5(c), the expression patterns of the 3 proteins were correlated with risk scores.
OS was significantly associated with the risk score and N stage, and the 3-protein signature risk score and N stage were both independent predictors of survival. To compare the prognostic power of the 3-protein signature risk score with that of the clinical factors, ROC curves of each independent variable were plotted, and the AUCs were calculated (Figure 6). The AUC of the 3-protein signature (0.750) was greater than that of N stage (0.624) in the training set, indicating that the 3-protein signature exhibited better sensitivity and specificity in predicting survival. Taken together, these results indicate that the 3-protein signature exhibits better predictive value for survival of HNSCC cases (hazard ratio (HR) = 1.471, 95% confidence interval (CI): 1.255-1.726, P < 0.0001, Figure 5(b)) than conventional clinical factors such as age, sex, smoking history, and TNM stage.
Consistent with the results of the training set, OS differed significantly (P < 0.05) between the low- and high-risk groups in the testing set and the combined cohort. The AUC for the 3-protein signature was 0.655 in the testing set (Figure 7(b)) and 0.699 in the combined cohort (Figure 7(d)), suggesting good performance of the 3-protein signature for predicting OS.
Figure 8. Thus, the 12 proteins may be related to the prognosis of HNSCC.

Discussion
In this study, we identified 3 proteins (X4EBP1_pT37T46, HER3_pY1289, and NF2) related to the survival of HNSCC patients and developed a model using the 3 proteins to predict their OS. A training set was used to develop the model, and the model was validated with a testing set. The AUC for the 3-protein signature was 0.655 in the testing set and 0.699 in the combined cohort, indicating good performance of the 3-protein signature in predicting the OS of HNSCC patients. In addition, the 3-protein signature exhibited better predictive value for survival of HNSCC patients than conventional clinical factors (age, sex, smoking history, and TNM stage). HNSCC is a relatively common malignancy and is very common in certain parts of the world [22]. Although there have been many advances in the understanding of the molecular biology of HNSCC [1,4,7-9], as well as in treatment options, the mortality of patients with HNSCC remains high. As such, there is a need for novel markers to predict prognosis and help guide treatment.
Bioinformatics studies have screened molecular biomarkers such as mRNA, miRNA, and lncRNA to predict the prognosis of HNSCC patients [11,12,23]. Advances in high-throughput proteomics techniques allow the quantitative assessment of large numbers of proteins in multiple specimens. As an antibody-based protein microarray dot-blot platform, the reverse-phase protein array (RPPA) allows protein expression levels to be measured quantitatively in a large number of biological samples simultaneously, provided that high-quality antibodies are available [24][25][26]. Many studies have used the RPPA technique to study protein biomarkers relevant to cancer progression, treatment selection, and prognostic prediction [19,27].
With major advances in bioinformatics, proteomics, and gene analysis techniques, many researchers have devoted themselves to developing signatures that predict the prognosis of patients with head and neck cancer using different methods. Prognostic signatures have been developed using miRNA [28,29], alternative splicing [30], immune function molecules [31], and m6A RNA methylation regulators [32].
In a study similar to ours, Zhao et al. [33] reported a 5-protein signature for predicting HNSCC prognosis. Notably, OS was much worse in patients with high-risk scores than in those with low-risk scores in the subgroups of male sex, tumor grade 1-2, age < 60 years, and disease Stages III-IV. OS differences were not significant in the subgroups of female sex, age ≥ 60 years, tumor grade 3-4, and disease Stages I-II. In other notable research, Jin et al. [34] reported that p53-targeted lncRNA-p21 serves as a tumor suppressor by suppressing JAK2/STAT3 signaling pathways in HNSCC. Zhang et al. [14] developed a model using 5 genes (KLHDC7B, MMP1, DPY19L2P1, HOXB9, and EMP1) as a novel signature for predicting the prognosis of patients with laryngeal cancer. The ROC curve analysis suggested that the 5-gene signature performed well in predicting laryngeal cancer prognosis (AUC = 0.862, P < 0.05). Guo et al. [23] reported a 6-mRNA (ZNF324B, YIPF4, TMC8, PDGFA, PCMT1, and FRMD5) signature model for determining HNSCC prognosis. The AUC of the model for predicting OS was 0.745 (P < 0.001). Wang et al. [35] recently reported that 3 microRNAs (hsa-miR-1911, hsa-miR-499a, and hsa-miR-99a) were independent risk factors significantly related to survival in patients with head and neck cancer (all, P < 0.01). In addition, GO and KEGG analyses showed an association of cancer prognosis with the JAK-STAT signaling pathway and certain metabolic pathways. In a unique study, You et al. [36] used cDNA microarrays and bioinformatics methods to study radioresistance in head and neck carcinoma and identified 4 key functional pathways and molecular markers that greatly promoted radioresistance. A recent report by Ribeiro et al. [37] studied tumor specimens from 40 patients with HNSCC undergoing tumor resection, and tumor-adjacent tissues from 32 of the patients. The authors identified a proteomic signature based on 3 proteins (DHB12, HMGB3, and COBA1) and developed a model that included the 3 proteins and tumor stage and exhibited >80% predictive accuracy for the development of metastasis and recurrence.
The primary limitation of this study is that the analysis was based on information contained in large databases. While this method provides important information and allowed us to develop a protein signature predictive of the OS of patients with HNSCC, clinical validation of the results was not part of the research design and hence was not performed. While the results are compelling, they need to be verified through clinical study of HNSCC patients.

Conclusion
In this report, we developed a 3-protein signature to predict the survival of HNSCC patients. The AUC for the 3-protein signature was 0.655 in the testing set and 0.699 in the combined cohort, indicating the favorable performance of the 3-protein signature in predicting the OS of HNSCC patients. In addition, the 3-protein signature exhibits better predictive value for survival of HNSCC patients than conventional clinical factors such as gender, smoking history, age, and TNM stage. These results add relevant information to the medical literature to help guide the management of patients with HNSCC.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.